Bioinformatics Service

Contents

  • 1 Data Description
  • 2 Analysis Workflow
  • 3 General Stats
  • 4 Metadata Exploration
    • 4.1 Cell Counts
    • 4.2 Total UMIs and Genes Detected in each cell
    • 4.3 UMI and Gene Detected Per Sample
  • 5 Quality Control
    • 5.1 Per Cell Quality Checking
    • 5.2 Low Quality Cells Filtering
  • 6 Normalization
  • 7 Feature Selection
  • 8 Clustering
    • 8.1 Cluster Identification
    • 8.2 Similarity in sample clusters
    • 8.3 Cluster visualization with UMAP
  • 9 Cell Type Annotation
    • 9.1 Assigned Cell Type
    • 9.2 UMAP visualization of Celltype
  • 10 Software Catalog

Single-cell RNA Sequencing Basic Report

Published

August 14, 2024

1 Data Description

This dataset is a subset of the large Human Cell Atlas (HCA) bone marrow dataset, comprising 380,000 cells from 8 donors.

2 Analysis Workflow

The basic analysis workflow will follow these main steps:

flowchart TD
    A((scRNA-seq Data)) --> B[Exploratory Data Analysis]
    A --> C[Quality Control]
    C --> D[Normalization]
    B --> E[Feature Selection]
    D --> E
    E --> F[Dimensionality Reduction]
    F --> G[Clustering]
    G --> H[Automated Celltype Annotation]

    %% Enhanced styling with lighter colors and distinct shapes
    style A fill:#f0f0f0,stroke:#888,stroke-width:1px,stroke-dasharray: 5 5;
    style B fill:#cfe,stroke:#888,stroke-width:1px;
    style C fill:#cfe,stroke:#888,stroke-width:1px;
    style D fill:#cfe,stroke:#888,stroke-width:1px;
    style E fill:#e7f7ff,stroke:#888,stroke-width:1px;
    style F fill:#e7f7ff,stroke:#888,stroke-width:1px;
    style G fill:#e7f7ff,stroke:#888,stroke-width:1px;
    style H fill:#f0f0f0,stroke:#888,stroke-width:1px,stroke-dasharray: 5 5;

3 General Stats

Note
  • nCell: Number of cells in each sample
  • mean_UMI: Mean total UMI count per cell in each sample
  • mean_Gene: Mean number of genes detected per cell in each sample
  • mean_Mito: Mean number of UMIs from mitochondrial genes per cell in each sample
  • mean_Mito_percent: Mean percentage of UMIs from mitochondrial genes per cell in each sample
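The per-sample statistics listed above can be sketched with NumPy. This is a minimal illustration rather than the report's own code: it assumes a dense cells × genes UMI matrix, a per-cell sample label array, and that human mitochondrial genes are recognized by the "MT-" symbol prefix. The helper names `per_cell_metrics` and `general_stats` are hypothetical.

```python
import numpy as np


def per_cell_metrics(counts, mito_mask):
    """Per-cell QC metrics from a dense cells x genes UMI matrix (assumed layout)."""
    total = counts.sum(axis=1)                      # total UMIs per cell
    detected = (counts > 0).sum(axis=1)             # genes with at least one UMI
    mito = counts[:, mito_mask].sum(axis=1)         # UMIs from mitochondrial genes
    mito_pct = 100.0 * mito / np.maximum(total, 1)  # guard against empty cells
    return total, detected, mito, mito_pct


def general_stats(counts, genes, samples):
    """Hypothetical helper: per-sample means of the per-cell metrics."""
    # Assumption: human mitochondrial genes carry the "MT-" symbol prefix.
    mito_mask = np.array([g.startswith("MT-") for g in genes])
    total, detected, mito, mito_pct = per_cell_metrics(counts, mito_mask)
    stats = {}
    for s in np.unique(samples):
        in_sample = samples == s
        stats[str(s)] = {
            "nCell": int(in_sample.sum()),
            "mean_UMI": float(total[in_sample].mean()),
            "mean_Gene": float(detected[in_sample].mean()),
            "mean_Mito": float(mito[in_sample].mean()),
            "mean_Mito_percent": float(mito_pct[in_sample].mean()),
        }
    return stats


counts = np.array([[1, 0, 3], [2, 2, 0], [0, 5, 5]])
genes = ["MT-CO1", "GAPDH", "ACTB"]
samples = np.array(["A", "A", "B"])
print(general_stats(counts, genes, samples)["A"])
# {'nCell': 2, 'mean_UMI': 4.0, 'mean_Gene': 2.0, 'mean_Mito': 1.5, 'mean_Mito_percent': 37.5}
```

Here mean_Mito_percent is read as the average of per-cell percentages, which is one interpretation of the definition above; a pooled ratio (total mitochondrial UMIs over total UMIs) would give a different number.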

4 Metadata Exploration

4.1 Cell Counts

4.2 Total UMIs and Genes Detected in each cell

  • UMI
  • Gene

4.3 UMI and Gene Detected Per Sample

  • UMI
  • Gene

5 Quality Control

5.1 Per Cell Quality Checking

This step uses the automated quality control function quickPerCellQC. For each of the metrics below, it computes a threshold based on the median absolute deviation (MAD) from the median across cells, and flags cells beyond that threshold as outliers to be discarded.

  • Total Counts
  • Detected Features
  • Mito percent

Note
  • Total Counts (Library size): Cells with an unusually small library size (e.g., more than 3 MADs below the median) are flagged.
  • Detected Features (Genes): Cells with an unusually low number of detected genes (e.g., more than 3 MADs below the median) are flagged.
  • Mito Percent (Mitochondrial Percent): Cells with an unusually high percentage of mitochondrial reads (e.g., more than 3 MADs above the median) are flagged.
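The MAD-based flagging described in the note can be sketched as follows. This is an illustration of the technique, not quickPerCellQC itself: the helper names are hypothetical, and applying the library-size and detected-gene thresholds on a log scale is an assumption that mirrors common practice (the log base does not change which cells are flagged, since the median and MAD scale together). The MAD is multiplied by 1.4826, the same constant R's `mad()` uses by default.

```python
import numpy as np


def mad_outliers(x, nmads=3.0, log=False, direction="both"):
    """Flag values more than `nmads` scaled MADs from the median."""
    v = np.asarray(x, dtype=float)
    if log:
        with np.errstate(divide="ignore"):
            v = np.log(v)  # zeros become -inf and are always flagged as low
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))  # same scaling constant as R's mad()
    low, high = v < med - nmads * mad, v > med + nmads * mad
    if direction == "lower":
        return low
    if direction == "higher":
        return high
    return low | high


def flag_low_quality(total, detected, mito_pct, nmads=3.0):
    """Hypothetical combination of the three filters; True means discard."""
    low_lib = mad_outliers(total, nmads, log=True, direction="lower")
    low_genes = mad_outliers(detected, nmads, log=True, direction="lower")
    high_mito = mad_outliers(mito_pct, nmads, direction="higher")
    return low_lib | low_genes | high_mito


total = np.array([1000, 1100, 950, 1050, 1020, 980, 1010, 10])
detected = np.array([500, 520, 480, 510, 505, 495, 515, 490])
mito_pct = np.array([40.0, 3, 2.5, 2, 3, 2.5, 2, 2])
discard = flag_low_quality(total, detected, mito_pct)
print(discard.tolist())
# [True, False, False, False, False, False, False, True]
```

Cell 0 is flagged for its high mitochondrial percentage and cell 7 for its tiny library; a cell is discarded if any single filter flags it. With a cells × genes matrix, the retained cells would be `counts[~discard]`.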

5.2 Low Quality Cells Filtering

The low-quality cells flagged above are discarded before downstream analysis.

6 Normalization

This step uses the function logNormCounts to normalize the data. Each cell's counts are divided by a size factor proportional to its total count (library size), with the size factors scaled to a mean of 1, and the normalized values are then log2-transformed after adding a pseudo-count of 1. This adjusts for differences in sequencing depth across cells and prepares the data for downstream analysis.
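Library-size normalization followed by a log transform can be sketched in a few lines of NumPy. This is a minimal sketch of the technique, not logNormCounts itself; it assumes a dense cells × genes count matrix with no empty cells, and the function name is hypothetical.

```python
import numpy as np


def log_norm_counts(counts, pseudo_count=1.0):
    """Library-size normalization followed by log2(x + pseudo_count)."""
    lib_size = counts.sum(axis=1).astype(float)  # total UMIs per cell
    size_factors = lib_size / lib_size.mean()     # centred so the mean size factor is 1
    normed = counts / size_factors[:, None]       # divide each cell (row) by its factor
    return np.log2(normed + pseudo_count), size_factors


counts = np.array([[2, 2], [4, 4]])
logcounts, sf = log_norm_counts(counts)
# Both cells normalize to 3 counts per gene, so every entry is log2(3 + 1) = 2.0.
print(logcounts)
```

The second cell was sequenced twice as deeply, yet after normalization both cells have identical values, which is exactly the depth adjustment described above.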

7 Feature Selection

This step performs feature selection by identifying highly variable genes (HVGs). First, modelGeneVarByPoisson models the variance of each gene's log-expression across cells and splits it into a technical component, estimated by assuming Poisson noise, and a biological component; donor is treated as a blocking factor so that differences between donors are not counted as biological variability. Then getTopHVGs selects the 5,000 genes with the largest biological components. These genes are the most likely to carry biologically meaningful variation and are used in downstream analyses (dimensionality reduction and clustering).
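The idea of separating Poisson technical noise from biological variance can be sketched as below. This is a deliberately simplified stand-in, not modelGeneVarByPoisson: it estimates technical variance with one Poisson simulation per gene instead of fitting a mean-variance trend, and it ignores donor blocking. The function names are hypothetical.

```python
import numpy as np


def log_norm(counts, size_factors):
    return np.log2(counts / size_factors[:, None] + 1.0)


def top_hvgs_poisson(counts, n_top, seed=0):
    """Rank genes by (observed - simulated Poisson) variance of log-expression.

    Simplified sketch: one Poisson simulation per gene replaces a fitted
    mean-variance trend, and there is no blocking on donor.
    """
    rng = np.random.default_rng(seed)
    lib = counts.sum(axis=1).astype(float)
    sf = lib / lib.mean()
    total_var = log_norm(counts, sf).var(axis=0, ddof=1)

    # Expected count of gene g in cell c under pure Poisson noise: sf_c * mu_g.
    mu = (counts / sf[:, None]).mean(axis=0)
    simulated = rng.poisson(sf[:, None] * mu[None, :])
    tech_var = log_norm(simulated, sf).var(axis=0, ddof=1)

    bio_var = total_var - tech_var           # "biological" component
    top = np.argsort(bio_var)[::-1][:n_top]  # largest biological variance first
    return top, bio_var


rng = np.random.default_rng(42)
counts = rng.poisson(5, size=(200, 10))
counts[:, 0] = np.where(np.arange(200) % 2 == 0, 50, 0)  # one strongly bimodal gene
top, bio = top_hvgs_poisson(counts, n_top=1)
print(top)  # gene 0 is expected to rank first
```

Genes whose variation is explained by sampling noise end up with a biological component near zero, while the bimodal gene stands out; in the report the same ranking logic is applied to keep the top 5,000 genes.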

Top 5000 HVGs

8 Clustering

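  • This step first computes a UMAP embedding from the MNN (mutual nearest neighbors)-corrected low-dimensional representation, using the Annoy algorithm for fast approximate nearest-neighbor search, which suits large datasets.
  • Next, it performs two-step clustering: K-means first groups the cells into 1,000 small clusters, then a nearest-neighbor graph (k = 5) is built on the cluster centroids and partitioned by graph-based clustering; each cell inherits the final cluster of its K-means centroid.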
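The two-step strategy can be sketched with NumPy alone. This is an illustration under stated assumptions, not the clustering code used in the report: K-means is plain Lloyd's algorithm, and the graph step is swapped for a simpler technique, namely connected components of a k-nearest-neighbor graph on the centroids, rather than community detection on a shared-neighbor graph. All function names are hypothetical, and the dense distance computations are only suitable for small inputs.

```python
import numpy as np


def kmeans(X, n_centers, n_iter=50, seed=0):
    """Plain Lloyd's K-means, initialized from randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(n_centers):
            members = X[assign == j]
            if len(members) > 0:  # leave empty centers where they are
                centers[j] = members.mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def knn_components(points, k):
    """Label connected components of an undirected kNN graph (union-find)."""
    n = len(points)
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in np.argsort(d[i])[:k]:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[ri] = rj
    roots = [find(i) for i in range(n)]
    relabel = {r: idx for idx, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])


def two_step_cluster(X, n_centers=1000, k=5, seed=0):
    """Step 1: many small K-means clusters. Step 2: merge centroids via a kNN graph."""
    centers, assign = kmeans(X, n_centers, seed=seed)
    centroid_labels = knn_components(centers, k)
    return centroid_labels[assign]  # each cell inherits its centroid's cluster


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(100, 1, (100, 2))])
labels = two_step_cluster(X, n_centers=30, k=3)
print(len(set(labels[:100].tolist()) & set(labels[100:].tolist())))  # no shared clusters expected
```

The design point carried over from the description above is that the expensive graph step runs on the centroids rather than on every cell, which is what makes the approach practical for hundreds of thousands of cells.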

8.1 Cluster Identification

Here is the clustering result:

8.2 Similarity in sample clusters

The distribution of cells across clusters and donors is shown below; it summarizes how much each donor contributes to each cluster.
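The numbers behind such a cluster-by-donor summary are a contingency table, optionally normalized within each cluster. A minimal NumPy sketch, assuming one cluster label and one donor label per cell; the function name is hypothetical.

```python
import numpy as np


def cluster_donor_table(clusters, donors):
    """Cells per (cluster, donor) pair, plus each donor's share within a cluster."""
    c_ids, c_idx = np.unique(clusters, return_inverse=True)
    d_ids, d_idx = np.unique(donors, return_inverse=True)
    table = np.zeros((len(c_ids), len(d_ids)), dtype=int)
    np.add.at(table, (c_idx, d_idx), 1)               # count cells per pair
    props = table / table.sum(axis=1, keepdims=True)  # row-normalize per cluster
    return c_ids, d_ids, table, props


clusters = np.array([1, 1, 1, 2, 2])
donors = np.array(["D1", "D2", "D1", "D2", "D2"])
c_ids, d_ids, table, props = cluster_donor_table(clusters, donors)
print(table.tolist())  # [[2, 1], [0, 2]]
```

A cluster whose row is dominated by a single donor (like cluster 2 here) may reflect donor-specific biology or residual batch effects, which is what this summary helps to spot.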

8.3 Cluster visualization with UMAP

The clusters are visualized on the UMAP embedding.

  • Cluster
  • Scrambling

9 Cell Type Annotation

9.1 Assigned Cell Type

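This step performs automated cell type classification against a reference dataset: the cells in each cluster are aggregated into a pseudo-bulk expression profile, and each cluster is labeled with the reference cell type whose profile it most closely resembles.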
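Cluster-level annotation by comparison with reference profiles can be sketched as follows. This is a simplified correlation-based approach, not the annotation tool used in the report: it averages log-expression per cluster and assigns the reference label with the highest Spearman correlation, without marker-gene selection or iterative fine-tuning. It assumes the cells and the reference share the same genes in the same order; all names and the toy data are hypothetical.

```python
import numpy as np


def pseudo_bulk(logexpr, clusters):
    """Average log-expression per cluster; logexpr is cells x genes."""
    ids = np.unique(clusters)
    return ids, np.vstack([logexpr[clusters == c].mean(axis=0) for c in ids])


def ranks(v):
    # Simple ranks (ties broken by position); a full Spearman averages tied ranks.
    r = np.empty(len(v))
    r[np.argsort(v, kind="stable")] = np.arange(len(v))
    return r


def spearman(a, b):
    return np.corrcoef(ranks(a), ranks(b))[0, 1]


def annotate_clusters(logexpr, clusters, ref_profiles, ref_labels):
    """Label each cluster with the best-correlated reference profile."""
    ids, bulk = pseudo_bulk(logexpr, clusters)
    out = {}
    for cid, profile in zip(ids, bulk):
        scores = [spearman(profile, ref) for ref in ref_profiles]
        out[int(cid)] = ref_labels[int(np.argmax(scores))]
    return out


rng = np.random.default_rng(0)
ref = rng.normal(size=(3, 30))                  # toy reference: 3 cell types x 30 genes
ref_labels = ["B cell", "T cell", "Monocyte"]
true_ref = {0: 2, 1: 0, 2: 1}                   # cluster -> reference row used to simulate it
clusters = np.repeat([0, 1, 2], 20)
logexpr = np.vstack([ref[true_ref[c]] + rng.normal(scale=0.1, size=30) for c in clusters])
print(annotate_clusters(logexpr, clusters, ref, ref_labels))
```

Annotating pseudo-bulk profiles rather than individual cells trades per-cell resolution for speed and robustness to sparse counts, which matters at the scale of this dataset.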

Note

Reference dataset: HumanPrimaryCellAtlasData. This reference dataset provides normalized expression values for 713 microarray samples from the Human Primary Cell Atlas (HPCA) (Mabbott et al., 2013). These 713 samples were processed and normalized as described in Aran, Looney and Liu et al. (2019).

9.2 UMAP visualization of Celltype

The assigned cell types are visualized on the UMAP embedding.

10 Software Catalog

Precigene